(57) Abstract: A method and system for universal querying of
distributed databases. The system includes an extractor (22)
to identify nouns and noun phrases in a user's query, a stemmer (24)
to return the base word in each query term,
and a synonym generator (26) to identify
synonyms for each query term. The method
includes the steps of processing a query to
generalize and expand the query to return
as many relevant terms as possible to the user, receiving
from the user selected terms which the
user expects to find in attributes of the
distributed databases, and searching the
directories of the distributed databases using
a Lightweight Directory Access Protocol
(LDAP) (30). A query executor or mediator
(32) retrieves data from the specified
databases in accordance with SQL code.

METHOD AND SYSTEM FOR UNIVERSAL QUERYING OF
DISTRIBUTED DATABASES 



FIELD OF THE INVENTION 
The present invention generally relates to a system and method for searching target 
databases, and in particular, to a system and method for universal querying of distributed 
databases. 

BACKGROUND OF THE INVENTION 
Numerous independently owned collections of data are being created and maintained
all over the world. The number of active and legacy data sources almost guarantees that part 
or all of a query can be answered using one of these countless databases. However, there exist 
several intermediate steps between posing a query and receiving an answer that make the task 
of querying others' databases almost impossible for the average user. First, the user must
locate a relevant data source. Then he or she must gain access to the source, pose the query 
using table names and attribute names from that target database, and finally must decide which 
of the returned data, if any, is relevant to the query. Users' queries must be formatted 
correctly, either using structured query language (SQL) code, or using formatted blocks of 
code (i.e., code generated by a back-end process based on user-filled selection boxes and text 
fields). 

While this list of steps is formidable, the process of querying is even more difficult if
multiple databases must be consulted to obtain a complete answer. Not only must the above 
steps be executed, but the data from different sources must be joined; and, if there are 
discrepancies, the user must decide which source is more reliable. In integrating the data, 
users must first understand elements of each database's schema so that corresponding fields 
between databases can be identified. Even once corresponding fields have been located, the user
must consider both the relative accuracy of the sources and the timeliness of the data contained
within the sources. For example, data in a five (5) year old database would obviously be less
relevant than data in a current database if a Department of Defense (DoD) member is querying
about current troop movements.

There are even more basic problems standing between the user's query and an 
answering data set. Databases are created with a particular task in mind. The database may 
be tailored for ease of asking particular types of queries, for ease of storing new data, or for 
storing groups of attributes as an object. Designing databases for specific purposes allows 
data to be stored and retrieved efficiently for a particular task and possibly a few related
tasks. However, this makes it nearly impossible to retrieve information for other unrelated
tasks. In looking at the task of querying from this perspective, it can be seen that the most
fundamental querying problem is that groupings of objects that make sense in one database
representation make it difficult to regroup attributes to form objects meaningful to a query
unrelated to the database's specific purpose. For example, consider the database tables below,
which have been excerpted from a hypothetical company's relational database.



Employee            Acquisition Agent
Employee_ID         Occupation_Code
Social_Sec_#        Salary_Band_A
Salary              Salary_Band_B
Title               Band_A_Max_PO

Tables and attributes from a hypothetical company database. The "Employee" table has key
Employee_ID and attributes Social_Sec_#, Salary, and Title. The "Acquisition Agent" table has
key Occupation_Code and attributes Salary_Band_A, Salary_Band_B, and Band_A_Max_PO.

A division of this hypothetical company has a database that keeps track of its employees.
The database has a table, "Employee," that contains basic information such as name, social
security number, salary, and job title. The key to this table is Employee_ID. The database
also has individual tables relating to each job title within the company. These tables note
the occupation's salary ranges (e.g., Salary_Band_A) and the specific duties at each salary
level (e.g., Band_A_Max_PO). For example, for an "Acquisition_Agent," the salary bands
are A, B, etc., and the maximum amount an individual in salary band A may purchase
is Band_A_Max_PO. This table's key is Occupation_Code. A reasonable query from
another division of the company could be "Return the individuals who can purchase more
than 5000 units of product X." Given the above two tables from the database, we can see
that the query will be difficult to execute. First, the individual asking the query would
have to know that Acquisition_Agent and Buyer were synonymous. Next, a join on salary
would need to be executed, but there is no common key. Finally, math would have to be
performed to translate between the maximum purchase order allowed (Band_A_Max_PO)
and the number of units of X a specific buyer could purchase. This seemingly simple
query requires a great deal of database-specific knowledge.

From the above discussion it is clear that there can be a number of issues 
encountered in trying to retrieve data from an unfamiliar source or sources. There is the 
initial task of locating relevant data sources. Even once this has been accomplished, the
problem of answering the query becomes no easier. Issues range from the banal, but
nontrivial, task of gaining access privileges, to the more theoretical and complex tasks of
regrouping of attributes to form real-world entities (i.e., the attributes within a table must
be understood as representations of actual physical objects). Several potential obstacles
are discussed below.

The first potential obstacle concerns gaining access to the relevant data source. 
This involves being allowed to read the database schema and the data contained within the 
database. Additionally, it may require the ability to store intermediate tables. When a 
large, multi-step query with several joins or cross products is carried out, the intermediate
tables generated need to be temporarily stored. If systems accessing the database are
remote, it is clearly impractical to transmit these larger data sets to the querying machine.
Thus, some local write space may be desired. 

A second potential obstacle concerns the fact that each database in the system may 
have been designed for efficiency for a system-specific task. Databases are created to fit
within larger systems. These systems have certain storage and retrieval requirements, as
well as baseline assumptions about data format. No matter how general a database schema
is developed, the schema must operate within the system and data requirements. This 
necessarily means there are queries the system will have difficulty answering. 

A third potential problem is that poorly labeled tables and attributes can make it
impossible to determine the real-world object being represented. Examples of table names
extracted from actual DoD data sources include: SUDOl, VNNZ, SYFA, and WUCl.
Examples of attribute names extracted from the same DoD source include: SC, TCN,
FROM_PPCl, and PRIME. Without the aid of documentation or the original database
designers, it is impossible to know what physical objects are represented by these tables.
Thus, data corresponding to a user's query is forever lost because a user or an automated
system will be unable to identify all relevant data.

The fourth potential problem in trying to answer a query is that documentation is
typically scarce and may not be any less cryptic than the database objects themselves.
Additionally, original database designers may have forgotten what the objects represent, or
they may have moved on to other sites. Users are left to map between database schema and
real-world objects to the best of their ability.

If the average user is able to overcome these obstacles and retrieve data from
several data sources, he or she must then combine the responses into a coherent solution set. This
compilation may involve conflict resolution among data rows. In some situations, it may
be acceptable to return both data items and allow the user to decide which data item is
more reliable. Consider, however, a fictitious military example. Two different databases
return different locations for the same enemy tank. One location is very close to a US
Army base, and the other set of coordinates places the tank much farther away. How
should the Army General querying the system react? Should he or she assume the tank is
close and ready the troops, and thus risk looking as if the base is preparing for military
action? Or should the General not mobilize troops and risk being unprepared for an enemy
attack? If the General does not know which data is more accurate in this case, then it is
very difficult to determine which results are correct and what action to take.

In addition to the issues discussed in the previous section, attempting to locate 
relevant databases and achieve accurate query responses in a military environment can be 
even more difficult. For example, not only does the user need to gain access to a database, 
but he or she must typically have the appropriate clearance level to see every row and 
column of the data returned. An even bigger obstacle to overcome is the fact that 
terminology across branches of the military is not always consistent. First, the same term
may have different meanings in different divisions of the military (e.g., rank has different
meanings across government military components). Second, the same object (e.g., a 20-foot
antenna or a type of ammunition) can have different names in different branches. The
first issue leads to a problem in query interpretation while the second creates a problem in
retrieving data across databases.

SUMMARY OF THE INVENTION
Accordingly, it is an object of the present invention to provide a system and 
methodology for querying distributed databases. 

It is another object of the present invention to provide a query system and method 
adapted to process unstructured queries. 

It is a further object of the present invention to provide a querying system and 
method which allows a user to retrieve data from a database as soon as such database is 
introduced into the system without causing the system to be halted or rebooted. 

It is still another object of the present invention to provide a querying system and 
method which aids in the generation of mediators. 

It is still another object of the present invention to provide a querying system and
methodology which does not utilize a shared representation.

Generally, the system and method of the present invention accomplishes one or
more of the above-noted objects of the present invention by providing an architecture which
allows users to enter unstructured queries, expands and generalizes such queries, and
matches the queries to actual target database tables. The method of the present invention
generally includes the steps of processing a query (e.g., an unstructured query) to generalize
and/or expand the query to return as many relevant words or terms as possible to the user,
receiving from the user selected words or terms which the user expects to find in attributes
of the distributed databases, and searching a database structure (e.g., an annotated database)
having directories extracted from target distributed databases, the directories including table
names, attribute names, sample data, and/or, if available, data dictionary information. Of
importance, the step of searching the database structure includes utilizing a Lightweight
Directory Access Protocol (LDAP), which allows quick access to information directories.
Since LDAP directories are designed for reading data rather than updating or adding new
data to the directories, the retrieval speed of information contained within the directories
(e.g., table names and table attributes) is very fast.

In one aspect of the method of the present invention, the step of processing a query
includes the step of receiving at least a first query from a user or client, the first query
including at least a first term, and the steps of identifying key terms and generalizing and/or
expanding the first query to enhance the likelihood of retrieval of relevant data to the user.
In this regard, the step of processing at least the first query may include the step of
identifying or extracting key words or terms from the first query, such as the first term,
since such key words may correspond to an attribute or table name in the target distributed
databases. In one embodiment, the step of extracting key words includes the step of
extracting at least a first noun and/or a first noun phrase from the first query. In another
embodiment, the step of extracting key words comprises the step of extracting at least a first
verb from the first query. In yet another embodiment, the step of extracting key words
comprises the step of extracting at least a first data item (e.g., part number) in the first
query. In order to further generalize the first query and enhance the chances of
capturing relevant information from target databases, the step of processing at least the first
query comprises the step of stemming at least a first term in the first query, such that at least
a first root word corresponding to the first term may be utilized in the final search. The
processing step may also include the step of generating at least a first synonym of at least
the first term of the first unstructured query to expand the scope of the search. The step of
processing at least the first query may be facilitated by presenting to the user an initial user
query screen, whereby the user is afforded an opportunity to select various options,
including perform stemming, include synonyms, include acronyms, and/or perform wild
card substitutions.
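
The key-word extraction and stemming steps described above can be sketched as follows. This is a minimal illustration only: the stop-word list and the suffix-stripping rules are assumptions standing in for a real noun-phrase extractor and stemmer, and the function names are hypothetical.

```python
import re

# Hypothetical stop-word list; a real extractor would use part-of-speech tagging.
STOP_WORDS = {
    "a", "an", "the", "of", "who", "can", "more", "than", "return",
    "which", "that", "to", "in", "for", "and", "or", "is", "are",
}

# Toy suffix rules, checked in order (longest first).
SUFFIXES = ("ers", "ing", "es", "s")


def extract_terms(query):
    """Crude stand-in for key-word extraction: tokenize and drop stop words."""
    words = re.findall(r"[A-Za-z0-9_]+", query.lower())
    return [w for w in words if w not in STOP_WORDS]


def stem(term):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


terms = extract_terms(
    "Return the individuals who can purchase more than 5000 units of product X"
)
stems = [stem(t) for t in terms]
```

The stemmed forms, rather than the literal query words, would then be carried into the synonym and directory search steps.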

Once such nouns, noun phrases, verbs, numbers, synonyms, acronyms, and/or
related terms are retrieved and/or generated, the processing step may include the step of
presenting such terms to the user in an expanded or refined user query screen format. Such
relevant words (e.g., nouns, noun phrases, verbs, numbers, and/or synonyms) may be
presented to the user or client to afford the user the opportunity to select the returned
relevant terms (e.g., gathered nouns, noun phrases, synonyms, acronyms, data items and/or
related items) which the user believes useful in searching the target distributed databases.
As a result, the user is able to select or collect terms for which the database schemae will be
searched.

In order to facilitate subsequent searches by a user, the step of processing the first
unstructured query from the user may further include the step of ranking selected relevant
terms (e.g., synonyms or other related terms). In this regard, if a term is selected, the rank
for such term is increased and, conversely, if a term is not selected, the rank for such term is
decreased. Additionally, the methodology of the present invention is adapted to learn from
the structure of the users' queries. In this regard, if query terms frequently occur together,
when a user submits only one of these terms, information regarding both terms may be
returned to the user to save the user time. Such learning capability may be included within
the LDAP. Conversely, if certain synonyms are not frequently selected, such synonyms will
not be returned to users in the future.
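
The ranking behavior described above can be sketched as follows. This is an illustrative sketch, not the patented implementation: the class name, the step size of one, and the threshold value of -2 are all assumptions, since the text gives no numeric values beyond terms starting at a rank of zero.

```python
from dataclasses import dataclass, field


@dataclass
class TermRanker:
    """Keeps one rank per suggested term (all names here are hypothetical)."""

    threshold: int = -2  # assumed cut-off; must be below the starting rank of 0
    ranks: dict = field(default_factory=dict)

    def record_feedback(self, returned, selected):
        """Raise the rank of selected terms and lower the rank of ignored ones."""
        chosen = set(selected)
        for term in returned:
            delta = 1 if term in chosen else -1
            self.ranks[term] = self.ranks.get(term, 0) + delta

    def visible(self, candidates):
        """Return only candidates whose rank is greater than the threshold."""
        return [t for t in candidates if self.ranks.get(t, 0) > self.threshold]


ranker = TermRanker()
suggestions = ["buyer", "purchaser", "vendee"]
for _ in range(2):
    # The user picks "buyer" twice and ignores the other two suggestions.
    ranker.record_feedback(suggestions, selected=["buyer"])
still_shown = ranker.visible(suggestions)
```

After two rounds the ignored synonyms reach the assumed threshold and stop being offered, while unseen terms still start at zero and are shown.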
As noted hereinabove, the method generally includes the step of searching a
database structure, such as an LDAP directory which may include attributes, table names,
sample data and/or data dictionary information in the target distributed databases, for
attributes and/or table names that match the terms selected by the user (e.g., augmented
query terms) and presenting such information to the user. Such attributes and/or table
names may be retrieved, along with the remaining attributes for tables that had matching
attribute names. A first tree may be constructed, whereby query term folders are populated
with database folders containing the tables that match the augmented query terms and such
tree is presented or returned to the user. Such folders are labeled with the query term or
terms that correspond to the matching tables contained in them. The methodology of the
present invention may further include the step of processing a final query from the user. In
one embodiment, the step of creating a final query comprises creating a pictorial query for
the user, whereby the user is allowed to add constraints and/or joins to produce a final
query. The step of processing the final query further includes the step of automatically
generating SQL code corresponding to the final query and utilizing a mediator to forward
the query to appropriate databases, receive data from each of the appropriate databases, and
return such data to a servlet, where such data is formatted and presented to the user.

In another aspect, the present invention relates to a system for processing at least a
first query to retrieve data relevant to the first query from at least a first of a plurality of
distributed or target databases. Generally, the system of the present invention includes a
computer system for at least receiving a first query from a first user, an extractor for
identifying key words, such as nouns, noun phrases, verbs or numbers in the first query, a
database structure, such as an LDAP directory, including at least one of a plurality of table
names and attributes relating to tables within the distributed databases, the directory being
searchable to provide the user, via the computer system, with at least a first database table
name and attributes associated with retrieved tables corresponding to the retrieved table
names, and a code generator for generating SQL code based upon retrieved table names
and/or attributes selected by the user. In order to enhance the search, the system may
further include a query generalizer for processing the first query to provide or return to the
user via the computer system terms related to at least a first term of the first query to enable
the user to select terms the user expects to find in distributed database tables. For purposes
of facilitating searching, the system may further include a learning program, whereby
information about which synonyms a particular type of user will need and which terms
often appear together in these queries is stored and/or ranked. In this regard, after the user
enters a query and chooses relevant synonyms, the rank of terms is updated. If terms are
selected, the rank of such terms is increased and conversely, if terms are not selected, the
rank of such terms is decreased. Further, the system is adapted to learn from the structure
of the users' queries. In this regard, if terms frequently occur together, then when the user
asks only about one of these terms, the system will return information about those terms to
the user. The system may further include a central mediator for receiving the query and
SQL code, via any computer system, the central mediator being in communication with the target
distributed databases. Such central mediator may be adapted to return the retrieved data
from the appropriate distributed databases to the computer system, which is capable of
formatting and presenting such data to the user via, for example, a display screen.



BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 is a diagrammatic illustration showing one embodiment of the system of the
present invention;

Fig. 2 is a diagrammatic illustration showing the architecture of the system
illustrated in Fig. 1;

Fig. 3 illustrates a view of displayed textual and graphic information related to
entering a query via the screen;

Fig. 4 illustrates the structure of a sample LDAP directory;

Fig. 5 illustrates a view of displayed textual and graphic information related to
expanding a user's query via the screen;

Fig. 6 illustrates a view of displayed textual and graphic information related to
finalizing a user's query via the screen;

Fig. 7 illustrates a sample SQL query;

Figs. 8A-8B present a flow chart of one embodiment of the method of the present
invention; and

Figs. 9A-9D present a flow chart of another embodiment of the method of the
present invention.

DETAILED DESCRIPTION

Generally, the method and system of the present invention allows users or clients to 
enter an unstructured query that the system expands and generalizes and then matches to 
actual database tables. Users may interact with the system in three different ways. First, the
users may enter an unstructured query, which may be a list of important terms as in a typical
search query or, alternatively, the query may be a natural language question or sentence.
Second, users may select the nouns, noun phrases, synonyms, and/or related terms that the
user expects to appear in the table names and/or attribute or field names of the target
databases. In this regard, after the unstructured query is received, nouns and/or noun
phrases in the query may be identified, and the query may be generalized and/or expanded
to return to the user as many relevant words as possible. From these returned words, the
user may select the terms he or she expects to find in the database attributes and/or table
names in the system. And third, the user may form Structured Query Language code
("SQL") by clicking on tables and attributes presented to the user. In this regard, the
database matches that the system believes correspond to the query are presented to the user
and, given the tables, the user may form a pictorial query from which the SQL code is
automatically generated. Thereafter, the data itself may be displayed to the user.

One embodiment of the system 10 of the present invention is illustrated in Fig. 1.
The architecture associated with the system of the present invention is illustrated in Fig. 2.
As noted hereinabove, the method first involves an interface with the user to allow the user
to specify an initial query. The first step in the querying process begins with the user
entering an initial query and ends with the system returning nouns, noun phrases, verbs,
data, synonyms, and/or related terms. The purpose of this phase of the querying process is
to identify key words of the query and to expand and/or generalize the initial query so that
later in the process, as many correct tables as possible may be returned to the user. In
particular, the system 10 of the present invention includes a computer system 16 (e.g., a
servlet), which generally functions to receive and send data to the appropriate destination.
In this embodiment, the computer system 16 is adapted to present the user with a query
input screen 50, illustrated in Fig. 3, which is displayable to the user via a display screen 18,
illustrated in Fig. 1. Such user query input screen 50 allows the user to enter an
unstructured query in the query field 52, illustrated in Fig. 3. Via this user query input
screen 50, the user may select which particular generalization and/or expansion functions
should be utilized in expanding the query, such as perform stemming, include synonyms,
include acronyms, perform wild card substitution and enable user assisted learning (to be
described in more detail hereinbelow). In this regard, the system 10 also includes a
noun/noun phrase extractor 22, a stemmer 24 and a synonym generator 26. The noun/noun
phrase extractor 22, which is commercially available from various vendors, is adapted to
identify important or key nouns and noun phrases in a user's query. In general, because of
the way in which queries are phrased, the most important items in a user query are nouns,
noun phrases, conditionals (e.g., greater than) and numbers. Thus the noun/noun
phrase/verb/data extractor 22 searches for only nouns, noun phrases, verbs and data and
returns to the user, via the computer system 16, a list of queryTerms (i.e., identified or
extracted nouns/noun phrases/verbs/data). The stemmer 24 returns the base word for each
queryTerm in the returned list. For example, the queryTerm "guns" would be turned into
"gun" and "buyers" would be turned into "buy."
The system 10 is also adapted to expand each term by finding synonyms. In this
regard, the synonym generator 26, which is commercially available as Princeton's WordNet
Lexicon, identifies synonyms for each queryTerm, and those synonyms whose rank is
greater than the system threshold are collected and returned to the user. Initially, all terms
are ranked at zero (0). As terms are returned and selected by the user, the rank of such
selected terms increases. If returned terms are not selected by the user, the rank of such
non-selected terms decreases. When the rank of such non-selected terms falls below the
user-acceptability level, these terms are no longer returned to the user. In the event user
assisted learning is enabled or desired by the user, related terms may be retrieved from the
LDAP 30, such related terms being words that frequently occur in queries with a
queryTerm. In this embodiment, the LDAP directory's learning branches, an example of
which is illustrated in Fig. 4, may contain the most frequently co-occurring terms for every
queryTerm. More specifically, as users query the system 10, the choices they make are used
to learn what types of information should be returned to them in future queries. Initially,
the system 10 knows nothing. All users have a blank slate, and all query responses are
equally "good" answers. After users enter queries and choose relevant synonyms, the rank
of terms is updated. If terms are selected, their rank is increased. Conversely, if terms are
not selected, their rank is decreased. Only terms whose rank is greater than the system
threshold are returned. Additionally, system 10 is adapted to learn from the structure of
the users' queries. If terms frequently occur together, then when the user asks
only about one of these terms, the system 10 will return information about
those terms. This saves the user time from having to add terms, and [...]
information may be already provided. These co-occurrence terms also suggest
information to the user and thus expand user queries. All terms in the learning branches are specific to the
type of user so that different meanings across different entities (e.g., Department of Defense
components) may be remembered. Once the nouns, noun phrases, verbs, data, synonyms
and/or related terms are retrieved, they are formatted (e.g., by the computer system or
servlet 16) into a Java Swing class component that allows data to be displayed in a tree
format (called a DefaultMutableTree) and sent to the expanded user query screen,
illustrated in Fig. 5, which is displayable to the user via display device 18.
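
The co-occurrence learning described above, kept separately for each type of user, might look like the following sketch. The class, method names, and the `min_count` and `limit` parameters are hypothetical; the source says only that frequently co-occurring terms are stored in the directory's learning branches.

```python
from collections import Counter, defaultdict
from itertools import permutations


class CooccurrenceLearner:
    """Counts which query terms appear together, per user type (illustrative)."""

    def __init__(self):
        # user_type -> term -> Counter of terms seen in the same query
        self._counts = defaultdict(lambda: defaultdict(Counter))

    def observe(self, user_type, query_terms):
        """Record every ordered pair of distinct terms from one query."""
        for first, second in permutations(set(query_terms), 2):
            self._counts[user_type][first][second] += 1

    def related(self, user_type, term, min_count=2, limit=5):
        """Return terms seen with `term` at least `min_count` times."""
        counter = self._counts[user_type][term]
        return [t for t, n in counter.most_common(limit) if n >= min_count]


learner = CooccurrenceLearner()
learner.observe("Navy officer", ["tank", "location"])
learner.observe("Navy officer", ["tank", "location"])
learner.observe("Navy officer", ["tank", "fuel"])
suggested = learner.related("Navy officer", "tank")
```

Keying the counts by user type keeps one group's vocabulary from leaking into another group's suggestions, which matches the stated goal of remembering different meanings across entities.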

Fig. 5 illustrates the expanded user query screen 60 which contains the nouns, noun
phrases, synonyms and/or related terms which have been retrieved and displayed in a tree
format in a left-hand area 62 of the screen. In this expanded user query screen 60, users can
[...] synonyms and related terms associated with [...].
The right-hand area 64 of the screen serves as an area to collect terms for which the
[...] database schemae will be searched. Users may add individual terms or [...]
from the left-hand area 62 to the right-hand side of the screen. The items in area
64 will be searched for in the attribute names of the target databases. Users may [...]
these terms and return to the screen to continue adding terms to the right-hand side 64
(Refine), or may choose to Submit the query. If Refine is chosen, the steps described
hereinabove relating to generalization and expansion of the query are followed and a new
tree of terms, synonyms, and related terms is sent to the user. However, if Submit is
selected, the terms in the Query box 66 are expanded and generalized as above, and the
resulting tree is not displayed to the user. Instead, all nouns and noun phrases, verbs,
data, synonyms and/or related terms are appended to the list of terms retrieved from the right-hand
area 64 of the expanded user query screen 60, illustrated in Fig. 5. Once users
submit these queryTerms, wild cards may be added to the [...]
spaces are replaced with wild cards (if the queryTerm is a noun phrase). The
LDAP structure 30, illustrated in Fig. 1, may be searched, and all attributes in the target
databases that match the queryTerms are retrieved, along with the rest
of the attributes for the tables that had matching attribute names, and a tree is constructed
whereby queryTerm folders are populated with database folders containing the tables that
match. This tree is then returned to the user via the computer system or servlet 16 as the
query generating screen 70, illustrated in Fig. 6.
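
The wild-card substitution and attribute-name matching described above can be sketched as follows. This is a plain in-memory illustration, not an LDAP client: the directory layout, the "HR" database name, and the choice to add wild cards at both ends of each term are assumptions.

```python
from fnmatch import fnmatchcase


def to_pattern(query_term):
    """Wrap the term in '*' (assumed) and replace inner spaces with '*'."""
    return "*" + "*".join(query_term.lower().split()) + "*"


def match_attributes(query_terms, directory):
    """Build {query_term: {database: {table: [all attributes]}}}.

    `directory` is a hypothetical layout: {database: {table: [attribute, ...]}}.
    A table is included when any of its attribute names matches the pattern,
    and all of that table's attributes are carried along.
    """
    tree = {}
    for term in query_terms:
        pattern = to_pattern(term)
        for database, tables in directory.items():
            for table, attributes in tables.items():
                if any(fnmatchcase(a.lower(), pattern) for a in attributes):
                    folder = tree.setdefault(term, {}).setdefault(database, {})
                    folder[table] = list(attributes)
    return tree


directory = {
    "HR": {
        "Employee": ["Employee_ID", "Salary", "Title"],
        "Acquisition_Agent": ["Occupation_Code", "Salary_Band_A", "Band_A_Max_PO"],
    }
}
result = match_attributes(["salary band", "title"], directory)
```

The nested result mirrors the folder tree in the text: one folder per query term, holding database folders, holding the matching tables with their full attribute lists.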

Of importance, the LDAP protocol allows quick access to the information in
directories, and provides capabilities allowing regular expression searches. Because LDAP
directories are designed for reading data rather than updating or adding new data to the
directories, the retrieval speed is fast. A sample LDAP structure 80 is illustrated in Fig. 4.
The left half 82 of the LDAP hierarchy holds data used in the learning algorithm (e.g.,
retrieval of related terms and/or synonyms). Data in these branches is specific to
particular types of users (e.g., general users, Navy officers, etc.). The right half 84 of the tree
contains details about databases contained in the system. For each database, tables,
attributes, attribute data types, and whether or not the attribute is a key are stored. The speed
of the LDAP search, coupled with the regular expression [...], is the
primary driver for storing both database structure and learning information in LDAP.
As noted hereinabove, the directory tree has two halves, a learning half and a database half.
In the learning half, information is stored about which synonyms a particular type of user
will need and which terms often occur together in these queries. In the database half 84 of
the tree, the structures of all the databases in the system are stored. In this embodiment,
there are four databases in the system: SUPPLY, Rainbow, KVDBA, and KVAWLm[...]. Such
databases may be extracted from structured databases (e.g., Oracle), or any other similar [...]
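
The two-halved directory described above can be modeled with a small in-memory structure, shown here as a sketch only. The class and field names, the PARTS table, and its attributes are hypothetical; only the SUPPLY database name and the kinds of information stored (tables, attributes, data types, key flags, learning data per user type) come from the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str
    is_key: bool = False


@dataclass
class DirectoryTree:
    # Learning half: user type -> term -> list of related terms or synonyms.
    learning: dict = field(default_factory=dict)
    # Database half: database -> table -> list of Attribute records.
    databases: dict = field(default_factory=dict)

    def add_table(self, database, table, attributes):
        """Register a table; new databases can be added at any time."""
        self.databases.setdefault(database, {})[table] = list(attributes)

    def key_attributes(self, database, table):
        """Names of the attributes flagged as keys for one table."""
        return [a.name for a in self.databases[database][table] if a.is_key]

    def add_related(self, user_type, term, related):
        """Store a related term under the given user type."""
        self.learning.setdefault(user_type, {}).setdefault(term, []).append(related)


tree = DirectoryTree()
tree.add_table(
    "SUPPLY",
    "PARTS",  # hypothetical table name
    [Attribute("PART_NO", "NUMBER", is_key=True), Attribute("DESCRIPTION", "VARCHAR2")],
)
tree.add_related("Navy officer", "antenna", "mast")
```

Because the structure is read far more often than it is written, registering a new database is a single insertion and does not disturb lookups against the existing entries.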

The third general step in the querying process is directed to creating a final query.
More specifically, this phase of the querying process allows users to create a pictorial query
and add constraints, view the automatically generated SQL code, submit the query and
browse the resulting data set. As noted hereinabove, the system 10 is
adapted to construct a tree whereby queryTerm folders are populated with database folders
containing the tables that match. Once the suggested tables are returned to the user in the
query generating screen 70, the user may select tables from the left-hand side of the
query generating screen 70, adding them to the right-hand work area 74. As tables 76a,
76b are added, keys or attributes that match between tables are automatically connected
with lines (e.g., joins 78), both of which may be deleted by the user, if desired. The system
10 is adapted to allow users to create further joins by clicking and dragging a mouse 40,
illustrated in Fig. 1, between attributes in different tables. Users may also add constraints
on a specific attribute by clicking the attribute so that it appears in the bottom portion 79 of
the screen, then filling in the rest of the constraint. Once the query is built by the servlet 16,
the user may view the SQL code in another screen presentable to the user (an example of
which is illustrated in Fig. 7), and press or select the submit option. The servlet 16 then
passes the query to a query executor 32 (e.g., mediator) which is commercially available
from a variety of vendors, the mediator being adapted to retrieve data from the specified
databases in accordance with the SQL code. The data retrieved are then returned to the
servlet 16, where the data are formatted and presented to the user via the display device 18.
In this embodiment, the query executor or mediator, in accordance with the SQL code, is
directed to retrieve only data from particular databases, tables and attributes, in view of the
joins and constraints, if any. No mappings between databases are created.
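The automatic connection of matching attributes between selected tables can be sketched as follows. The text does not say how a match is decided, so this example assumes that two attributes match when their names are identical; the table names, attribute names, and the `suggest_joins` helper are hypothetical.

```python
from itertools import combinations


def suggest_joins(tables):
    """tables maps a table name to its list of attribute names.
    Returns (table_a, table_b, attribute) for every attribute name
    shared by a pair of tables (assumed matching rule: equal names)."""
    joins = []
    for (name_a, attrs_a), (name_b, attrs_b) in combinations(sorted(tables.items()), 2):
        for attr in sorted(set(attrs_a) & set(attrs_b)):
            joins.append((name_a, name_b, attr))
    return joins


# Hypothetical tables placed in the work area by the user.
selected = {
    "ORDERS": ["order_id", "part_id", "quantity"],
    "PARTS": ["part_id", "description"],
    "DEPOTS": ["depot_id", "location"],
}

joins = suggest_joins(selected)
print(joins)

# The user may delete a suggested join before the SQL code is generated.
remaining = [j for j in joins if j != ("ORDERS", "PARTS", "part_id")]
print(remaining)
```

Keeping the suggested joins as plain tuples makes it straightforward to model the user deleting a join or adding one by hand before the query is built.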

In another aspect, the present invention is directed to a method for querying
distributed databases. Generally, and referring to Figs. 8A-8B, in this embodiment, the
method of the present invention includes a step 112 of receiving from a user a query (e.g.,
structured query or unstructured query). Thereafter, the method includes a step 116 of
identifying/extracting nouns, noun phrases, verbs and/or data from the query. Thereafter,
the methodology includes a step 120 of sending to the user the enhanced query, at which
point the user may select terms which will be searched for in any attribute names of the target
databases. Users are also afforded the opportunity to enter new queries. In this regard, in
the event a new query is received from the user, the new query may be received and then
processed as described hereinabove. Otherwise, the method includes the step 124 of
receiving the selected terms of the enhanced query from the user. Thereafter, the method
includes the step 128 of searching the database structure (e.g., LDAP structure) to retrieve
all attributes in the target databases that match the terms of the enhanced query selected by
the user. The method then includes the step of retrieving the rest of the attributes for the
tables that had matching attribute names and the step 132 of sending to the user matching
tables in the distributed databases. At this point, the method of the present invention
allows the user to create a pictorial query and to add constraints. In this regard, the method
includes the step 136 of receiving selected tables from the user corresponding to the tables
from which the user wishes data to be retrieved from the system databases. Upon receipt of
such selected tables from the user, the method includes the step 140 of generating SQL
code and the step 144 of retrieving data from the appropriate target databases and the step
148 of sending the retrieved data to the user.
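Step 128 and the step that follows it (match the selected terms against attribute names, then pull in the remaining attributes of each matching table) can be sketched as a single pass over a catalog. This is an illustration under stated assumptions: the catalog contents and the `build_result_tree` helper are hypothetical, and a case-insensitive substring test stands in for whatever matching the directory search actually performs.

```python
# Hypothetical catalog: database -> table -> attribute names.
CATALOG = {
    "SUPPLY": {
        "PARTS": ["part_id", "part_name", "unit_cost"],
        "ORDERS": ["order_id", "part_id", "quantity"],
    },
    "Rainbow": {
        "COSTS": ["cost_code", "amount"],
    },
}


def build_result_tree(catalog, terms):
    """Return {term: {database: {table: all attributes}}} for every table
    that has at least one attribute name containing the term."""
    tree = {}
    for term in terms:
        needle = term.lower()
        for db_name, tables in catalog.items():
            for table_name, attributes in tables.items():
                if any(needle in attr.lower() for attr in attributes):
                    # Keep the full attribute list, not only the matching names.
                    tree.setdefault(term, {}).setdefault(db_name, {})[table_name] = list(attributes)
    return tree


tree = build_result_tree(CATALOG, ["cost", "part"])
for term, databases in tree.items():
    print(term, databases)
```

The nested result has the same shape as the tree of queryTerm folders and database folders that the specification says is returned to the user.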
Another embodiment of the methodology of the present invention is illustrated in
Figs. 9A-9D. Initially, the methodology includes the step 210 of presenting the user with a
query input screen, which is illustrated in Fig. 3. As noted hereinabove, the query input
screen presents the user with an area to enter the query, and gives the user several options
such as whether to perform stemming, include synonyms, include acronyms, perform wild
card substitutions, and enable user assisted learning. In this regard, the method further
includes the step 214 of receiving the initial query from the user and the step 218 of
extracting nouns and/or noun phrases from the initial query, such extracted nouns/noun
phrases identified as "queryTerms". In the event the user has requested stemming to be
performed, at least a first noun in the initial query or a first queryTerm is stemmed (step
222) and in the event the user has requested synonyms be included, the method may further
include the step 226 of generating at least a first synonym relating to the first noun or first
queryTerm. In the event the user assisted learning has been enabled or requested by the
user, the method includes the step 228 of retrieving related terms from the LDAP, a related
term being a word that frequently occurs in queries with a queryTerm.
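The optional stemming and synonym steps can be sketched as a small query-expansion function. Several simplifications are assumed here and are not taken from the specification: a stop-word filter stands in for real noun and noun-phrase extraction, a naive suffix stripper stands in for the stemmer, and the synonym table is a hypothetical dictionary.

```python
import re

# Assumed stop-word list; a real system would use a part-of-speech tagger.
STOP_WORDS = {"the", "of", "for", "all", "in", "a", "an", "and", "show", "me", "list"}

# Hypothetical synonym table keyed by stemmed term.
SYNONYMS = {"ship": ["vessel"], "cost": ["price", "expense"]}


def stem(word):
    """Naive suffix stripping, used only to illustrate the stemming option."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query, do_stem=True, do_synonyms=True):
    """Return {queryTerm: [synonyms]} for the candidate terms in the query."""
    words = re.findall(r"[a-z]+", query.lower())
    terms = [w for w in words if w not in STOP_WORDS]
    expanded = {}
    for term in terms:
        base = stem(term) if do_stem else term
        expanded[base] = list(SYNONYMS.get(base, [])) if do_synonyms else []
    return expanded


result = expand_query("Show me the costs of all ships")
print(result)
```

The two boolean flags correspond to the stemming and synonym options on the query input screen; with both switched off, the function returns the extracted terms unchanged.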
Once the nouns, noun phrases, verbs, data, synonyms and/or related terms are
retrieved, the method further includes the step 230 of presenting such terms to the user in an
expanded user query screen, illustrated in Fig. 5. As noted hereinabove, the expanded user
query screen allows the user to specify which synonyms and/or related terms correspond to
the query. In this regard, the method further includes the step 234 of receiving stem terms,
related terms and/or synonyms selected by the user. The method then includes a step 238 of
ranking the selected and non-selected terms or words and the step 242 of storing the rank of
such terms and the related terms for future use (as described hereinabove). Utilizing the
queryTerms selected by the user, the method includes the step 246 of searching the LDAP
for matching attributes and the step 250 of retrieving the matching attributes and associated
table names.
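One way to picture the ranking and storing steps is a pair of counters that is updated each time the user confirms a selection. The scoring rule below (plus one for a selected term, minus one for an offered term that was not selected) and the co-occurrence count used for related terms are assumptions made for illustration; the passage above does not specify the ranking formula. The `TermLearner` class and its method names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


class TermLearner:
    """Keeps a rank per term and counts how often selected terms occur together."""

    def __init__(self):
        self.rank = Counter()
        self.cooccurrence = defaultdict(Counter)

    def record(self, offered, selected):
        """Update ranks for all offered terms and co-occurrence for selected ones."""
        chosen = set(selected)
        for term in offered:
            self.rank[term] += 1 if term in chosen else -1
        for a, b in permutations(sorted(chosen), 2):
            self.cooccurrence[a][b] += 1

    def related_terms(self, term, limit=3):
        """Terms most often selected together with `term`, most frequent first."""
        return [t for t, _ in self.cooccurrence[term].most_common(limit)]


learner = TermLearner()
learner.record(offered=["ship", "vessel", "boat"], selected=["ship", "vessel"])
learner.record(offered=["ship", "port", "boat"], selected=["ship", "port"])
learner.record(offered=["ship", "vessel"], selected=["ship", "vessel"])

print(dict(learner.rank))
print(learner.related_terms("ship"))
```

In the described system the stored ranks and related terms would live in the learning half of the directory; here they are held in memory only.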

The method then includes the step 254 of presenting to the user a query generating
screen, illustrated in Fig. 6. This phase of the querying process allows users to create a
pictorial query, add constraints, view the automatically generated SQL code, submit the
query and browse the resulting data set. As noted hereinabove, the query generating screen
illustrated in Fig. 6 presents to the user the suggested returned tables, from which the user
may select by adding selected tables to the right-hand work area from the left-hand side of
the screen. It should be noted that in the event the user is not satisfied with the
presented/retrieved tables, the user may elect to refine the query, whereby the user will be
presented with the expanded query screen, illustrated in Fig. 5, or, alternatively, the user
may submit a new query by going back to the user query input screen. In the event the user
wishes to continue, the method further includes the step 256 of joining matching attributes
between selected retrieved tables. In the event the user does not wish to proceed with the
query utilizing one or more of the automatically joined matching attributes, the method
includes the step 260 of deleting appropriate joins, as selected by the user. The user may
also add joins manually by clicking and dragging the mouse between attributes in different
tables. In this regard, the method may further include the step 264 of joining selected
attributes, as selected by the user. The method of the present invention further allows
constraints to be added on specific attributes by clicking the attribute so that it appears in
the bottom portion of the query generating screen, and allowing the user to fill in the
constraint. In this regard, the method includes the step 268 of constraining selected
attributes in accordance with the user's request. Thereafter, the method includes the step
272 of generating an SQL query based upon the selected tables, constraints and joins. Such
SQL code may be presented to the user at step 276. Once the query is built, the user may
view the SQL code and submit the query to the servlet, which passes the query to the
mediator, which is adapted to retrieve data from the specified databases in accordance
with the SQL query. In this regard, the method includes the step 280 of receiving the SQL
query, the step 284 of retrieving data from the appropriate target databases in accordance
with the SQL query, the step 288 of formatting retrieved data and finally, the step 292 of
presenting the formatted retrieved data to the user.
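Generating the SQL query from the selected tables, joins and constraints (step 272) can be sketched as below. The `build_sql` helper, the table and attribute names, and the sample rows are hypothetical, and an in-memory SQLite database from the Python standard library stands in for the mediator, which the specification describes only as a commercially available component. Constraint values are passed as parameters rather than pasted into the SQL text.

```python
import sqlite3

ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">=", "<>", "LIKE"}


def build_sql(tables, joins, constraints):
    """tables: list of table names.
    joins: list of (table_a, attr_a, table_b, attr_b).
    constraints: list of (table, attr, operator, value).
    Returns (sql_text, parameters)."""
    conditions = [f"{ta}.{aa} = {tb}.{ab}" for ta, aa, tb, ab in joins]
    params = []
    for table, attr, op, value in constraints:
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        conditions.append(f"{table}.{attr} {op} ?")
        params.append(value)
    sql = "SELECT * FROM " + ", ".join(tables)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


sql, params = build_sql(
    tables=["ORDERS", "PARTS"],
    joins=[("ORDERS", "part_id", "PARTS", "part_id")],
    constraints=[("ORDERS", "quantity", ">", 10)],
)
print(sql)

# In-memory SQLite database used here as a stand-in for the mediator.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PARTS (part_id INTEGER, description TEXT);
    CREATE TABLE ORDERS (order_id INTEGER, part_id INTEGER, quantity INTEGER);
    INSERT INTO PARTS VALUES (1, 'valve'), (2, 'gasket');
    INSERT INTO ORDERS VALUES (100, 1, 25), (101, 2, 5);
    """
)
rows = conn.execute(sql, params).fetchall()
print(rows)
conn.close()
```

The generated text can be shown to the user before submission, as at step 276, because building the query and executing it are kept as separate steps.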

As a result, the present invention is particularly useful in aiding users in accessing
data from distributed, structured databases, whereby users need not know the structure or
even the existence of the databases needed to complete their queries. When querying the
system, users need not know of the existence of relevant data sources currently available in
the system, need not understand the schemae of the databases, need not know SQL, and are
not limited to formatting queries using drop-down menus. Rather, users may enter an
unstructured query, select from synonyms and related terms automatically generated by the
system to expand the user's initial query, and then generate a pictorial query using
tables the querying system suggests as relevant. After forming and submitting this query,
users are presented with the corresponding data from actual databases.

The foregoing description of the present invention has been presented for purposes
of illustration and description. Furthermore, the description is not intended to limit the
invention to the form disclosed herein. Consequently, variations and modifications
commensurate with the above teachings, and the skill or knowledge of the relevant art, are
within the scope of the present invention. The embodiments described hereinabove are
further intended to explain best modes known for practicing the invention and to enable
others skilled in the art to utilize the invention in such, or other, embodiments and with
various modifications required by the particular applications or uses of the present
invention. It is intended that the appended claims be construed to include alternative
embodiments to the extent permitted by the prior art.

CLAIMS

What is claimed is:

1. A method of obtaining information from a plurality of distributed databases,
comprising the steps of:

receiving a first query from a first user;

processing the first query to identify a plurality of key terms in the first
query, the plurality of key terms comprising at least one of a first noun and a first noun
phrase;

displaying to the first user an expanded query including at least a plurality of
returned key terms;

receiving from the first user a plurality of select key terms, the plurality of
select key terms being selected from the plurality of returned key terms by the first user;

processing the expanded query to retrieve a plurality of attributes and a
plurality of table names, wherein the plurality of attributes corresponds to the plurality of
table names, wherein the plurality of table names correspond to a plurality of tables in the
plurality of distributed databases;

displaying to the first user the plurality of table names and the plurality of
attributes;

receiving from the first user a final query, the final query including a
plurality of select tables, the plurality of select tables being selected from the plurality of
table names by the first user;

processing the final query to generate a first SQL query corresponding to the
final query; and

returning to the first user a first data result set based on the first SQL query.

2. A method as claimed in Claim 1, wherein the first query is an unstructured
query.

3. A method as claimed in Claim 1, wherein said step of processing the first
query comprises the step of stemming a first of the plurality of key terms.

4. A method as claimed in Claim 1, further comprising the step of:

identifying at least one of a first verb and a first data item within the first query.

5. A method as claimed in Claim 1, wherein said step of processing the first
query comprises the step of identifying at least a synonym of a first of the plurality of
key terms.

6. A method as claimed in Claim 1, further comprising the steps of:

ranking at least each of the plurality of select key terms to produce a rank of
terms; and

storing the rank of terms for the first user.

7. A method for obtaining data from a plurality of distributed databases, said
method comprising the steps of:

receiving a first query from a first user; and

processing the first query to search a plurality of directories corresponding to
the plurality of distributed databases to retrieve a plurality of database structures, wherein
the plurality of database structures correspond to the first query.

8. A method as claimed in Claim 7, wherein the plurality of database structures
comprises a plurality of table names and attributes.

9. A method as claimed in Claim 7, wherein said processing step comprises the
steps of:

extracting from the first query at least one of a first noun, a first noun phrase,
a first verb and a first number; and

searching the plurality of directories for at least one of the first noun, the first
noun phrase, the first verb and the first number.

10. A method as claimed in Claim 7, wherein said processing step comprises the
steps of:

stemming at least a first term relating to the first query to produce a first
stemmed term; and

searching the plurality of directories for the first stemmed term.

11. A method as claimed in Claim 7, wherein said processing step comprises the
steps of:

retrieving a first synonym of a first term relating to the first query; and

searching the plurality of directories for at least one of the first term and the
first synonym.

12. A method as claimed in Claim 7, wherein said processing step comprises the
steps of:

retrieving at least a first related term corresponding to a first term, the first
term being related to the first query;

receiving from the first user at least a first selected term corresponding to at
least one of the first relevant term and the first term of the first query; and

searching the plurality of directories for at least the first selected term,
wherein a first of the plurality of directories includes the first selected term.

13. A method as claimed in Claim 12, further comprising the steps of:

generating a first SQL query from at least the first selected term and the first
of the plurality of directories; and

retrieving from a first of the plurality of distributed databases corresponding
to the first of the plurality of directories first data, wherein the first data is displayable to the
first user.

FIG. 9B

FIG. 9C